Running Jupyter Notebook on Google Cloud for a Kaggle challenge

When you start a Kaggle challenge, you usually need a machine with enough memory to hold the whole dataset and a GPU to speed up training. Rather than buying a new computer, I prefer to do it for free with the $300 credit offered by Google Cloud Platform (GCP).

Step 1: Create a free account in Google Cloud

For this step, create a new Google Account or sign in with your existing one at https://cloud.google.com/. You will then have to enter your payment information and verify your account.

Step 2: Create a new project on GCP

Click on the three dots shown in the image below and then click on the + sign to create a new project.

Step 3: Create a VM instance

Click on the three lines in the upper left corner, then, under the Compute section, click on ‘Compute Engine’.

Now click on ‘Create new instance’. Name your instance and select a zone close to you; in my case, I chose 'europe-west1-b'. Choose your ‘machine type’ (I chose 8 vCPUs and 52 GB of memory because I had a huge dataset). GCP will give you an estimated price for your configuration.

You can also customize your virtual machine if you need GPUs. Note that GPUs are available only in certain zones, so make sure that you have chosen a zone from the list below:

  • us-west1-b
  • us-central1-c
  • us-central1-f
  • us-east1-c
  • europe-west1-b
  • europe-west1-d
  • asia-east1-a
  • asia-east1-c
  • europe-west4-a
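
If you script your setup, a small check like the one below can catch a wrong zone before you try to attach a GPU. This is only a sketch: the set is copied from the list above, and GPU availability changes over time, so treat it as a snapshot. The function name `zone_supports_gpu` is my own.

```python
# Zones copied from the list above; this is a snapshot, not an
# authoritative or current source of GPU availability.
GPU_ZONES = {
    "us-west1-b", "us-central1-c", "us-central1-f", "us-east1-c",
    "europe-west1-b", "europe-west1-d", "asia-east1-a", "asia-east1-c",
    "europe-west4-a",
}


def zone_supports_gpu(zone: str) -> bool:
    """Return True if `zone` appears in the GPU zone list above."""
    return zone.strip().lower() in GPU_ZONES


print(zone_supports_gpu("europe-west1-b"))  # True
print(zone_supports_gpu("europe-west2-a"))  # False
```

For an up-to-date view, `gcloud compute accelerator-types list` should show which accelerators each zone currently offers.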

Under the Boot Disk option, select ‘Ubuntu 16.04 LTS’ as your OS and set the disk size to whatever your datasets need; for example, I needed 50 GB.

Under the firewall options, tick both ‘http’ and ‘https’ (very important). Then, choose the disk tab and untick ‘Delete boot disk when instance is deleted’.

Now click on ‘Create’ and your instance is ready!
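
If you prefer the command line to the console, the same choices can probably be expressed as a single `gcloud compute instances create` call. The sketch below only builds and prints the command; it does not run it. Several things are my assumptions rather than part of the steps above: the instance name `kaggle-vm`, the machine type `n1-highmem-8` (which matches 8 vCPUs and 52 GB), and the `ubuntu-1604-lts` image family, which may no longer be offered.

```python
import shlex


def build_create_command(name, zone, machine_type="n1-highmem-8", disk_size_gb=50):
    """Return the gcloud argument list for a VM like the one described above."""
    return [
        "gcloud", "compute", "instances", "create", name,
        f"--zone={zone}",
        f"--machine-type={machine_type}",      # assumed: 8 vCPUs, 52 GB
        "--image-family=ubuntu-1604-lts",      # assumed image family name
        "--image-project=ubuntu-os-cloud",
        f"--boot-disk-size={disk_size_gb}GB",
        "--tags=http-server,https-server",     # what the http/https boxes apply
        "--no-boot-disk-auto-delete",          # keep the disk if the VM is deleted
    ]


# "kaggle-vm" is a hypothetical instance name used for illustration.
cmd = build_create_command("kaggle-vm", "europe-west1-b")
print(shlex.join(cmd))
```

Printing the command first lets you review it before pasting it into a terminal where the Cloud SDK is installed.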

IMPORTANT : DON’T FORGET TO STOP YOUR GPU INSTANCE AFTER YOU ARE DONE BY CLICKING ON THE THREE DOTS ON THE IMAGE ABOVE AND SELECTING STOP. OTHERWISE GCP WILL KEEP CHARGING YOU ON AN HOURLY BASIS.
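
Since forgetting to stop the instance costs money, it can help to keep a tiny script around for it. This is a minimal sketch assuming the Cloud SDK is installed locally; the instance name and zone are placeholders, and `dry_run=True` (the default here) only returns the command without executing it.

```python
import subprocess


def stop_instance(name, zone, dry_run=True):
    """Build the `gcloud compute instances stop` command and optionally run it."""
    cmd = ["gcloud", "compute", "instances", "stop", name, f"--zone={zone}"]
    if not dry_run:
        # Raises CalledProcessError if gcloud exits with a non-zero status.
        subprocess.run(cmd, check=True)
    return cmd


# "kaggle-vm" is a hypothetical instance name; nothing is executed here.
print(stop_instance("kaggle-vm", "europe-west1-b"))
```

Note that a stopped instance no longer bills for CPU or GPU time, but its disk and any reserved static IP address may still incur charges.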

Step 4: Make the external IP address static

By default, the external IP address is dynamic, so it can change when the instance restarts; making it static makes our life easier. Click on the three horizontal lines on the top left, then, under Networking, click on VPC network and then External IP addresses.

Change the type from Ephemeral to Static.

Now, click on the ‘Firewall rules’ setting under VPC network. Click on ‘Create Firewall Rules’ and refer to the image below:

Under protocols and ports you can choose any port. I have chosen tcp:5000 as my port number. Now click on the save button.
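
The firewall rule can also be expressed as a `gcloud compute firewall-rules create` call. The sketch below builds that command for a given port without running it. The rule name `allow-jupyter` and the source range `0.0.0.0/0` (any address) are my assumptions; narrowing the range to your own IP would be safer.

```python
import shlex


def build_firewall_command(rule_name, port, source_range="0.0.0.0/0"):
    """Return gcloud arguments that open one TCP port for incoming traffic."""
    if not 1 <= port <= 65535:
        raise ValueError(f"port must be between 1 and 65535, got {port}")
    return [
        "gcloud", "compute", "firewall-rules", "create", rule_name,
        "--direction=INGRESS",
        f"--allow=tcp:{port}",
        f"--source-ranges={source_range}",  # assumed: open to any address
    ]


# "allow-jupyter" is a hypothetical rule name used for illustration.
print(shlex.join(build_firewall_command("allow-jupyter", 5000)))
```

Whatever port you pick here has to match the one used in the Jupyter configuration later on.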

Step 5: Install Google Cloud SDK

Depending on your OS, refer to the corresponding guide at https://cloud.google.com/sdk/docs/quickstarts

Then run gcloud init and follow the steps on the website to initialize your Google Cloud SDK.

Step 6: Install Jupyter notebook and other packages

Open a terminal and connect to your VM instance:

gcloud compute --project <project name> ssh --zone <zone name> <instance name>

Then, install Anaconda3 on your VM:

wget http://repo.continuum.io/archive/Anaconda3-5.1.0-Linux-x86_64.sh

bash Anaconda3-5.1.0-Linux-x86_64.sh

and follow the on-screen instructions. The defaults usually work fine, but answer yes to the last question about prepending the install location to PATH:

Do you wish the installer to prepend the 
Anaconda3 install location to PATH 
in your /home/haroldsoh/.bashrc ? 
[yes|no][no] >>> yes

To make use of Anaconda right away, source your bashrc:

source ~/.bashrc

Now, install any other packages you need, for example:

conda install -c conda-forge lightgbm 
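
To confirm that the installs worked, you can ask Python which packages it can find. This is a small sketch using only the standard library; the package names in the example call are just the ones relevant to this walkthrough, and the helper name `missing_packages` is my own.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# An empty list means every package was found in the active environment.
print(missing_packages(["lightgbm", "notebook"]))
```

If a package shows up in the list, check that you sourced your bashrc so that the Anaconda Python is the one on your PATH.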

Step 7: Set up the VM server

Open an SSH session to your VM. Check if you have a Jupyter configuration file:

ls ~/.jupyter/jupyter_notebook_config.py

If it doesn’t exist, create one:

jupyter notebook --generate-config

We’re going to add a few lines to your Jupyter configuration file. The file is plain text, so you can do this with your favorite editor (e.g., vim, emacs). Make sure you replace the port number with the one you allowed firewall access to in step 4.

On a Mac, a file or folder whose name starts with a dot is hidden. To view hidden files and folders in a text editor's Open/Save dialog, press Command+Shift+Period (that’s the . key).

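c = get_config()
c.NotebookApp.ip = '*'  # listen on all network interfaces, not just localhost
c.NotebookApp.open_browser = False  # do not try to open a browser on the VM
c.NotebookApp.port = <Port Number>  # the port you opened in the firewall in step 4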

It should look something like this:
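
If you would rather not edit the file by hand, the same lines can be appended with a short script run on the VM. This is a sketch, not the only way to do it: `render_config` and `append_config` are names I made up, and the script assumes the default config location checked earlier in this step.

```python
from pathlib import Path


def render_config(port: int) -> str:
    """Return the configuration lines from this step with the port filled in."""
    if not 1 <= port <= 65535:
        raise ValueError(f"port must be between 1 and 65535, got {port}")
    return (
        "c = get_config()\n"
        "c.NotebookApp.ip = '*'\n"
        "c.NotebookApp.open_browser = False\n"
        f"c.NotebookApp.port = {port}\n"
    )


def append_config(config_path: Path, port: int) -> None:
    """Append the rendered lines to the given Jupyter config file."""
    config_path.parent.mkdir(parents=True, exist_ok=True)
    with config_path.open("a", encoding="utf-8") as fh:
        fh.write("\n" + render_config(port))


# Preview only; nothing is written until append_config is called, e.g.
# append_config(Path.home() / ".jupyter" / "jupyter_notebook_config.py", 5000)
print(render_config(5000))
```

Appending rather than overwriting keeps whatever `jupyter notebook --generate-config` already put in the file.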

Step 8: Launching Jupyter Notebook

To run Jupyter Notebook, type the following command in your SSH session:

jupyter notebook --ip=0.0.0.0 --port=<port-number> --no-browser &

Once the command is running, Jupyter prints a URL containing a login token in the terminal.

Now, to launch your Jupyter notebook, type the following in your browser:

http://<External Static IP Address>:<Port Number>

where the external IP address is the one we made static and the port number is the one we allowed through the firewall.
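
If the page does not load, it helps to check from your own machine whether the port is reachable at all, which separates firewall problems from Jupyter problems. Here is a minimal sketch using a plain TCP connection; the helper name `port_is_open` is mine, and the IP address in the usage comment is a documentation placeholder to replace with your static IP.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example usage (replace the placeholder address with your static IP):
# print(port_is_open("203.0.113.10", 5000))
```

A False result usually points to the firewall rule, the port number, or a notebook server that is not running.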

When the login page appears, enter the token you got in the last step.

You now have a Jupyter notebook running on GCP.
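
As a small convenience, Jupyter also accepts the token as a `token` query parameter, so you can build a link that logs you in directly. The sketch below assumes that behavior; the IP address and token are placeholders, and `notebook_url` is a name I made up.

```python
from urllib.parse import urlencode


def notebook_url(ip, port, token=None):
    """Build the notebook URL, adding the login token as a query parameter."""
    base = f"http://{ip}:{port}/"
    if token:
        return f"{base}?{urlencode({'token': token})}"
    return base


# Placeholder IP and token, for illustration only.
print(notebook_url("203.0.113.10", 5000, "abc123"))
# http://203.0.113.10:5000/?token=abc123
```

Keep in mind that this is plain http, so avoid sharing such a link: anyone who has it can use your notebook server.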

References

  1. Running Jupyter Notebook on Google Cloud Platform in 15 min

  2. Google Cloud Quickstarts